rmse 1
A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation
Song, Xinran, Chen, Tianyu, Zhou, Mingyuan
Estimating individualized treatment effects from observational data is a central challenge in causal inference, largely due to covariate imbalance and confounding bias from non-randomized treatment assignment. While inverse probability weighting (IPW) is a well-established solution to this problem, its integration into modern deep learning frameworks remains limited. In this work, we propose Importance-Weighted Diffusion Distillation (IWDD), a novel generative framework that combines the pretraining of diffusion models with importance-weighted score distillation to enable accurate and fast causal estimation, including potential outcome prediction and treatment effect estimation. We demonstrate how IPW can be naturally incorporated into the distillation of pretrained diffusion models, and further introduce a randomization-based adjustment that eliminates the need to compute IPW explicitly, thereby simplifying computation and, more importantly, provably reducing the variance of gradient estimates. Empirical results show that IWDD achieves state-of-the-art out-of-sample prediction performance, with the highest win rates compared to other baselines, significantly improving causal estimation and supporting the development of individualized treatment strategies. We will release our PyTorch code for reproducibility and future research.
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
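For readers unfamiliar with the inverse probability weighting (IPW) that the IWDD abstract above builds on, the following is a minimal sketch of the classical IPW estimator of an average treatment effect. It is not the IWDD method itself. The synthetic data, the function names, and the assumption that the true propensity scores are known are all hypothetical choices made for illustration.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Classical IPW estimate of the average treatment effect.

    y: observed outcomes, t: binary treatment indicators (0/1),
    e: propensity scores P(T=1 | X), assumed known and strictly in (0, 1).
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    return float(np.mean(t * y / e - (1.0 - t) * y / (1.0 - e)))


def naive_difference(y, t):
    """Unadjusted difference in mean outcomes between treated and control units."""
    y, t = np.asarray(y, dtype=float), np.asarray(t)
    return float(y[t == 1].mean() - y[t == 0].mean())


def simulate(n=200_000, seed=0):
    """Hypothetical confounded data: x drives both treatment and outcome."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-x))          # true propensity score
    t = rng.binomial(1, e)                # non-randomized treatment assignment
    y = 2.0 * t + x + rng.normal(size=n)  # true treatment effect is 2.0
    return y, t, e


y, t, e = simulate()
print("naive difference:", naive_difference(y, t))
print("IPW estimate:    ", ipw_ate(y, t, e))
```

In this simulation the confounder x raises both the chance of treatment and the outcome, so the unadjusted difference overstates the effect, while reweighting by the propensity score brings the estimate close to the true value of 2.0.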
Stochastic Flow Matching for Resolving Small-Scale Physics
Fotiadis, Stathi, Brenowitz, Noah, Geffner, Tomas, Cohen, Yair, Pritchard, Michael, Vahdat, Arash, Mardani, Morteza
Conditional diffusion and flow models have proven effective for super-resolving small-scale details in natural images. However, in physical sciences such as weather, super-resolving small-scale details poses significant challenges due to: (i) misalignment between input and output distributions (i.e., solutions to distinct partial differential equations (PDEs) follow different trajectories), (ii) multi-scale dynamics, with deterministic dynamics at large scales and stochastic dynamics at small scales, and (iii) limited data, increasing the risk of overfitting. To address these challenges, we propose encoding the inputs to a latent base distribution that is closer to the target distribution, followed by flow matching to generate small-scale physics. The encoder captures the deterministic components, while flow matching adds stochastic small-scale details. To account for uncertainty in the deterministic part, we inject noise into the encoder's output using an adaptive noise scaling mechanism, which is dynamically adjusted based on maximum-likelihood estimates of the encoder's predictions. We conduct extensive experiments on both the real-world CWA weather dataset and the PDE-based Kolmogorov dataset, with the CWA task involving super-resolving the weather variables for the region of Taiwan from 25 km to 2 km scales. Our results show that the proposed stochastic flow matching (SFM) framework significantly outperforms existing methods such as conditional diffusion and flows.
Resolving small-scale physics is crucial in many scientific applications (Wilby et al., 1998; Rampal et al., 2022; 2024). For instance, in the atmospheric sciences, accurately capturing small-scale dynamics is essential for local planning and disaster mitigation.
The success of conditional diffusion models in super-resolving natural images and videos (Song et al., 2021; Batzolis et al., 2021; Hoogeboom et al., 2023) has recently been extended to super-resolving small-scale physics (Aich et al., 2024; Ling et al., 2024). However, this task faces significant challenges: (C1) Input and target data are often spatially misaligned due to differing PDE solutions operating at various resolutions, leading to divergent trajectories. Additionally, the input and target variables (channels) often represent different physical quantities, causing further misalignment. Few efforts have been made to directly address these challenges in generative learning. Prior work typically relies on residual learning approaches (Mardani et al., 2023; Zhao et al., 2021).
- Asia > Taiwan (0.25)
- Oceania > New Zealand (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
Modeling Daily Pan Evaporation in Humid Climates Using Gaussian Process Regression
Shabani, Sevda, Samadianfard, Saeed, Sattari, Mohammad Taghi, Shamshirband, Shahab, Mosavi, Amir, Kmet, Tibor, Varkonyi-Koczy, Annamaria R.
Evaporation is one of the main processes in the hydrological cycle and one of the most critical factors in agricultural, hydrological, and meteorological studies. Because multiple climatic factors interact, evaporation is a complex, nonlinear phenomenon, so data-driven methods can be used to estimate it precisely. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor, Random Forest, and Support Vector Regression were used to estimate pan evaporation (PE) at meteorological stations in Golestan Province, Iran. For this purpose, meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W), and sunny hours (S) were collected from the Gonbad-e Kavus, Gorgan, and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using Root Mean Squared Error, correlation coefficient, and Mean Absolute Error. Furthermore, Taylor charts were used to evaluate the accuracy of the models. We report that GPR with input parameters T, W, and S for the Gonbad-e Kavus station, and GPR with input parameters T, RH, W, and S for the Gorgan and Bandar Torkman stations, performed most accurately and are proposed for precise estimation of PE. Given the high rate of evaporation in Iran and the lack of measurement instruments, the findings of the current study indicate that PE values may be estimated accurately from a few easily measured meteorological parameters.
- Asia > Middle East > Iran > Golestan Province > Gorgan (0.47)
- Asia > China (0.05)
- Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)
- (14 more...)
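The three accuracy indices named in the pan evaporation abstract above (Root Mean Squared Error, Mean Absolute Error, and correlation coefficient) can be computed as in the minimal sketch below. The function names and the sample values are hypothetical and are not taken from the study; the correlation coefficient is assumed here to be Pearson's r.

```python
import numpy as np


def rmse(observed, predicted):
    """Root Mean Squared Error between observed and predicted values."""
    o, p = np.asarray(observed, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))


def mae(observed, predicted):
    """Mean Absolute Error between observed and predicted values."""
    o, p = np.asarray(observed, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(p - o)))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of 'correlation coefficient')."""
    o, p = np.asarray(observed, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.corrcoef(o, p)[0, 1])


# Hypothetical daily pan evaporation values in mm/day (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]

print("RMSE:", rmse(observed, predicted))
print("MAE: ", mae(observed, predicted))
print("r:   ", pearson_r(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, while the correlation coefficient measures only linear association and is insensitive to a constant bias, which is one reason studies of this kind tend to report all three.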
Are Learned Molecular Representations Ready For Prime Time?
Yang, Kevin, Swanson, Kyle, Jin, Wengong, Coley, Connor, Eiden, Philipp, Gao, Hua, Guzman-Perez, Angel, Hopper, Timothy, Kelley, Brian, Mathea, Miriam, Palmer, Andrew, Settels, Volker, Jaakkola, Tommi, Jensen, Klavs, Barzilay, Regina
Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 15 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Germany (0.04)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area (0.73)
TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings
Guo, Guibing (Nanyang Technological University) | Zhang, Jie (Nanyang Technological University) | Yorke-Smith, Neil (American University of Beirut and University of Cambridge)
Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on SVD++, a state-of-the-art recommendation algorithm that inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach TrustSVD achieves better accuracy than ten other counterparts and can better handle the data sparsity and cold start issues.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Singapore (0.04)
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.04)
Conquering the rating bound problem in neighborhood-based collaborative filtering: a function recovery approach
Huang, Junming, Cheng, Xue-Qi, Shen, Hua-Wei, Sun, Xiaoming, Zhou, Tao, Jin, Xiaolong
As an important tool for information filtering in the era of the social web, recommender systems have developed rapidly in the last decade. Benefiting from better interpretability, neighborhood-based collaborative filtering techniques, such as the item-based collaborative filtering adopted by Amazon, have achieved great success in many practical recommender systems. However, the neighborhood-based collaborative filtering method suffers from the rating bound problem, i.e., the rating that this method estimates for a target item is bounded by the observed ratings of all its neighboring items. Therefore, it cannot accurately estimate the unobserved rating on a target item if its ground truth rating is actually higher (lower) than the highest (lowest) rating over all items in its neighborhood. In this paper, we address this problem by formalizing rating estimation as a task of recovering a scalar rating function. With a linearity assumption, we infer all the ratings by optimizing the low-order norm, e.g., the $l_{1/2}$-norm, of the second derivative of the target scalar function, while keeping the observed ratings unchanged. Experimental results on three real datasets, namely Douban, Goodreads and MovieLens, demonstrate that the proposed approach can effectively overcome the rating bound problem. In particular, it improves the accuracy of rating estimation by 37% over conventional neighborhood-based methods.
- North America > United States > New York > New York County > New York City (0.06)
- Asia > China > Beijing > Beijing (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
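The rating bound problem described in the abstract above can be seen in a minimal sketch of a neighborhood-based estimator. The similarity-weighted average used here is an assumed, textbook form of the estimator with non-negative similarities; it illustrates the problem only and is not the paper's function recovery approach. The function name and the sample values are hypothetical.

```python
import numpy as np


def neighborhood_estimate(neighbor_ratings, similarities):
    """Similarity-weighted average of the ratings observed on neighboring items.

    Assumes non-negative similarities with a positive sum, so the result is a
    convex combination of the neighbor ratings.
    """
    r = np.asarray(neighbor_ratings, dtype=float)
    w = np.asarray(similarities, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("similarities must be non-negative with a positive sum")
    return float(np.dot(w, r) / w.sum())


# Hypothetical neighborhood: no neighboring item is rated above 4.
ratings = [3.0, 4.0, 4.0]
sims = [0.9, 0.5, 0.2]

estimate = neighborhood_estimate(ratings, sims)
print("estimate:", estimate)
print("bounds:  ", min(ratings), max(ratings))
```

Because the estimate is a convex combination, it can never leave the interval spanned by the neighbor ratings. If the true rating of the target item were 5, no choice of non-negative similarities could recover it from this neighborhood.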